This section details how to use the GEGVICshine app. The same information can be found in the Getting started tab within the app.
Within GEGVICshine, the user is guided through the different steps necessary to perform all the analyses. This includes the uploading of the input files and all the tuning parameters that can be modified to personalise the outputs. This vignette explores all the options present in the app and shows how to use them properly.
This tab contains the present manual page and information about the project.
Parameters is the default tab that appears when opening the app and is the section where the user can upload the input files and define the rest of the parameters needed to perform all the analyses. In total there are four sub-tabs in this section and the user must fill in the indicated parameters in each case and click the button to access the next sub-tab.
In all cases, each parameter has a tag (in brackets) indicating the modules for which it is needed. The codes are: GE for the GE_module (Gene Expression), GV for the GV_module (Genetic Variations) and IC for the IC_module (Immune cell Composition).
The parameters sub-tabs are:
Click to select which of the modules (either one, two or all three) will be included in the analysis.
There is an additional button to download sample input data. These can be used to reproduce the results from this manual.
Here the user needs to upload the necessary input files:
All files must be .csv files (comma separated values) except for the gene sets that must be in the form of a .gmt file. The format of each specific file should be as follows:
Genetic variations: Table containing short variant calls. Necessary columns MUST have the following names (following the MAF format: https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/):
– Hugo_Symbol: Gene symbol from HGNC.
– Chromosome: Affected chromosome.
– Start_Position: Mutation start coordinate.
– End_Position: Mutation end coordinate.
– Reference_Allele: The plus strand reference allele at this position. Includes the deleted sequence for a deletion or “-” for an insertion.
– Tumor_Seq_Allele2: Tumor sequencing discovery allele.
– Variant_Classification: Translational effect of variant allele. Can be one of the following: Frame_Shift_Del, Frame_Shift_Ins, In_Frame_Del, In_Frame_Ins, Missense_Mutation, Nonsense_Mutation, Silent, Splice_Site, Translation_Start_Site, Nonstop_Mutation, RNA, Targeted_Region.
– Variant_Type: Type of mutation. Can be: ‘SNP’ (Single nucleotide polymorphism), ‘DNP’ (Double nucleotide polymorphism), ‘INS’ (Insertion), ‘DEL’ (Deletion).
– Tumor_Sample_Barcode: Sample name.
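As a rough illustration of the column requirements listed above, the following Python sketch checks a comma-separated variant table before it is uploaded. It is not part of GEGVICshine: the function name `check_variant_table` and the example rows are hypothetical, and the app may apply different or additional checks.

```python
import csv
import io

# Column names and allowed values come from the list above; the MAF
# specification spells the mutation-type column "Variant_Type".
REQUIRED_COLUMNS = [
    "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
    "Reference_Allele", "Tumor_Seq_Allele2", "Variant_Classification",
    "Variant_Type", "Tumor_Sample_Barcode",
]
ALLOWED_TYPES = {"SNP", "DNP", "INS", "DEL"}
ALLOWED_CLASSIFICATIONS = {
    "Frame_Shift_Del", "Frame_Shift_Ins", "In_Frame_Del", "In_Frame_Ins",
    "Missense_Mutation", "Nonsense_Mutation", "Silent", "Splice_Site",
    "Translation_Start_Site", "Nonstop_Mutation", "RNA", "Targeted_Region",
}


def check_variant_table(handle):
    """Return a list of problems found in a comma-separated variant table."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        # Without the required columns the rows cannot be checked.
        return [f"missing column: {name}" for name in missing]
    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_number, row in enumerate(reader, start=2):
        if row["Variant_Type"] not in ALLOWED_TYPES:
            problems.append(
                f"row {row_number}: unexpected Variant_Type {row['Variant_Type']!r}"
            )
        if row["Variant_Classification"] not in ALLOWED_CLASSIFICATIONS:
            problems.append(
                f"row {row_number}: unexpected Variant_Classification "
                f"{row['Variant_Classification']!r}"
            )
    return problems


# Made-up rows for illustration only; the second one has an invalid type.
example = io.StringIO(
    ",".join(REQUIRED_COLUMNS) + "\n"
    "GENE_A,1,100,100,G,A,Missense_Mutation,SNP,sample_1\n"
    "GENE_B,2,200,200,C,T,Missense_Mutation,SNV,sample_2\n"
)
problems = check_variant_table(example)
print(problems)
```

To check a real file, pass an open file handle instead of the in-memory example, for instance `check_variant_table(open("variants.csv", newline=""))`, where the file name is a placeholder.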
Select the response variable: Once the metadata file is uploaded, the user must select which variable will be used as grouping variable in all of the different analyses.
Gene set collections to be analysed by GSEA, in the form of a .gmt file: These files can be downloaded from the Molecular Signatures Database (MSigDB) or custom-built following the corresponding [guidelines](https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats). When working with mouse data, gmt files can be found here.
To use the CIBERSORT algorithm, the user needs to register on the CIBERSORT web page (https://cibersort.stanford.edu), obtain a license and download the source code in the form of two files, CIBERSORT.R and LM22.txt. Both files then need to be uploaded in the corresponding space.
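For readers building a custom gene set file, a .gmt file is a tab-separated text file in which each line holds a gene set name, a description and then the genes of that set. The Python sketch below reads that layout; the function name `read_gmt` and the example gene sets are hypothetical and only illustrate the format, not how GEGVICshine reads the file internally.

```python
import io


def read_gmt(handle):
    """Parse .gmt lines into a dict mapping gene set name to its genes."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any gene
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Made-up gene sets for illustration only.
example = io.StringIO(
    "SET_ONE\thttp://example.org/set_one\tGENE_A\tGENE_B\tGENE_C\n"
    "SET_TWO\tna\tGENE_B\tGENE_D\n"
)
gene_sets = read_gmt(example)
print(gene_sets)
```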
This section contains several parameters that need to be completed either as a drop-down list, an empty space to be filled in with text or with numeric values or as checkboxes. List and numeric parameters contain one of the options selected by default, whereas in the case of the text parameters the box contains a character string giving the user a hint as to what can be entered.
Genes ID (GE, IC): Name of the column that contains gene identifiers (entrezgene_id, ensembl_gene_id or hgnc_symbol).
Design formula (GE): A formula that expresses how the counts for each gene depend on the variables in the metadata (for example Cell + Treatment + Cell:Treatment).
Reference level (GE): Name of the grouping variable in the metadata that will be used as the reference to be compared against.
Colors (GE, GV, IC): Indicate the color for each sample group, separated by commas.
Shrinkage method (GE): Name of the shrinkage method to apply within the DESeq2 algorithm. It can be either apeglm, ashr, normal or none. Default value is apeglm; use none to skip shrinkage.
BiomaRt database (GE, IC): Data frame containing a biomaRt query with the following attributes: ensembl_gene_id, hgnc_symbol, entrezgene_id, transcript_length, refseq_mrna. Options are ensembl_biomartGRCh37 and ensembl_biomartGRCh38_p13 for Homo sapiens, or ensembl_biomartGRCm38_p6 and ensembl_biomartGRCm39 for Mus musculus samples.
Fold Change (GE): An integer to define the fold change value to consider that a gene is differentially expressed. Default value is 2.
Adjusted p-value for gene expression data (GE): Numeric value to define the maximum adjusted p-value to consider that a gene is differentially expressed.
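To make the two thresholds above concrete, here is a minimal Python sketch of how a fold change and a maximum adjusted p-value could be combined to flag a gene. The exact rule used by the app is not stated in this manual, so the criterion below is an assumption, as are the function name and the 0.05 cutoff. DESeq2 reports fold changes on the log2 scale, so the fold change of 2 is converted with log2 before comparison.

```python
import math


def is_differentially_expressed(log2_fold_change, padj,
                                fold_change=2, padj_cutoff=0.05):
    """Assumed rule: |log2FC| >= log2(fold_change) and padj <= padj_cutoff.

    fold_change=2 is the default given above; padj_cutoff=0.05 is a
    hypothetical value chosen only for this illustration.
    """
    if padj is None:
        # DESeq2 can report NA for padj; treat such genes as not significant.
        return False
    return (abs(log2_fold_change) >= math.log2(fold_change)
            and padj <= padj_cutoff)


# Made-up results: (gene, log2 fold change, adjusted p-value).
results = [
    ("GENE_A", 2.3, 0.001),
    ("GENE_B", -1.5, 0.03),
    ("GENE_C", 0.4, 0.0001),   # significant but below the fold change
    ("GENE_D", 3.1, 0.2),      # large change but not significant
    ("GENE_E", 1.2, None),     # no adjusted p-value available
]
selected = [gene for gene, lfc, padj in results
            if is_differentially_expressed(lfc, padj)]
print(selected)
```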
Adjusted p-value for GSEA (GE): Numeric value to define the adjusted p-value cutoff in GSEA. Set to 0.2 by default.
Select genesets for GSVA (GE): List of options to choose whether the gene sets that will be used for GSVA are those from the Hallmark collection or the same ones that were indicated for GSEA.
GSVA method (GE): List of the methods available for GSVA. Either gsva, ssgsea or zscore.
Advanced plot options: Check boxes to decide whether column names, row names or points should be added in specific plots.
Cancer types: TCGA Study Abbreviations (IC): List of TCGA study abbreviations. For more information visit the following link.
Select means comparison method (GV, IC): Methods used for comparing means between groups. Options are t.test (parametric) and wilcox.test (non-parametric) for two groups, or anova (parametric) and kruskal.test (non-parametric) for more than two groups.
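The choice among the four comparison methods depends only on the number of sample groups and on whether a parametric test is appropriate. A small Python sketch of that decision follows; the function name `suggest_comparison_method` is hypothetical, and the app itself leaves the choice to the user.

```python
def suggest_comparison_method(n_groups, parametric):
    """Map group count and test family to one of the four listed options."""
    if n_groups < 2:
        raise ValueError("at least two groups are needed to compare means")
    if n_groups == 2:
        return "t.test" if parametric else "wilcox.test"
    return "anova" if parametric else "kruskal.test"


print(suggest_comparison_method(2, parametric=False))
print(suggest_comparison_method(3, parametric=True))
```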
Number of genes for oncoplot (GV): Numeric value indicating the number of genes that will appear in the oncoplot.
Select genomic build (GV): Version of the genome to work with. Options are BSgenome.Hsapiens.UCSC.hg19 or BSgenome.Hsapiens.UCSC.hg38 for Homo sapiens and BSgenome.Mmusculus.UCSC.mm10 or BSgenome.Mmusculus.UCSC.mm39 for Mus musculus.
Select mutational signatures matrix (GV): Mutation matrices from [COSMIC](https://cancer.sanger.ac.uk/signatures/downloads/) for single (SBS) and double (DBS) base substitutions. Matrices from versions 2 and 3.2 are available for Homo sapiens and Mus musculus.
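The module tags in the parameter list above can be read as a lookup table: for a given selection of modules, only the parameters carrying at least one of the selected tags are relevant. The Python sketch below encodes the tags exactly as listed in this section; the names `PARAMETER_MODULES` and `parameters_for` are hypothetical and do not correspond to anything in the app.

```python
# Parameter labels and module tags as given in the list above.
PARAMETER_MODULES = {
    "Genes ID": {"GE", "IC"},
    "Design formula": {"GE"},
    "Reference level": {"GE"},
    "Colors": {"GE", "GV", "IC"},
    "Shrinkage method": {"GE"},
    "BiomaRt database": {"GE", "IC"},
    "Fold Change": {"GE"},
    "Adjusted p-value for gene expression data": {"GE"},
    "Adjusted p-value for GSEA": {"GE"},
    "Select genesets for GSVA": {"GE"},
    "GSVA method": {"GE"},
    "Cancer types: TCGA Study Abbreviations": {"IC"},
    "Select means comparison method": {"GV", "IC"},
    "Number of genes for oncoplot": {"GV"},
    "Select genomic build": {"GV"},
    "Select mutational signatures matrix": {"GV"},
}


def parameters_for(selected_modules):
    """List the parameters tagged with at least one selected module."""
    selected = set(selected_modules)
    return [name for name, modules in PARAMETER_MODULES.items()
            if modules & selected]


gv_only = parameters_for(["GV"])
print(gv_only)
```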
Once all the desired modules have been selected, the input files have been uploaded and the parameters have been set, click the execute button to start the analyses. Once the process starts, a progress bar will appear in the lower right corner of the screen showing the processes that are being executed on the server.
This section contains the results of the differential gene expression analysis, which is calculated using the DESeq2 package. For GSEA, the clusterProfiler package is used for the analysis, and the GSEAmining and GSVA packages are used to summarise and visualise the results.
The next section, entitled Differentially Expressed Genes, contains the rest of the analyses in different sub-tabs. There will be one tab per group comparison, so the number of tabs depends on the total number of sample groups. In each case the following information will be shown:
Two plots from the GSEAmining package will be shown. The first is a clustering of the top 20 most enriched gene sets in the analysis. The second is a word cloud for each cluster, highlighting the most enriched terms within the gene sets present in that cluster.
Note: If no gene sets are enriched at the p-value cutoff defined by the user, a message will be shown instead of the table and figures.
This section contains the results of the analysis of genetic variations in the samples. Mutation summaries are calculated using the maftools package, whereas mutational signatures are predicted using the deconstructSigs package.
This section contains the predictions of immune cell composition in the tumour microenvironment from RNA-sequencing data. Predictions are made using the immunedeconv package, which includes six prediction algorithms: QUANTISEQ, TIMER, MCP_COUNTER, XCELL, EPIC and CIBERSORT.